Quotation Extraction for Portuguese
Abstract
Quotation extraction consists of identifying quotations and their authors. In this work, we present a Quotation Extraction system for Portuguese that is based on Entropy Guided Transformation Learning, a supervised Machine Learning algorithm. This is the first Quotation Extraction system for Portuguese that uses a Machine Learning approach. To train and evaluate the proposed system, we build the GLOBOQUOTES corpus, with news extracted from the GLOBO.COM portal. Our system obtains an Fβ=1 score of 79.02% for the subtask of associating a quotation with its author. For the whole Quotation Extraction task, the observed Fβ=1 score is 66.03%. These findings indicate that the overall extraction quality is highly dependent on the quotation identification subtask.
Similar resources
QUEMDISSE? Reported speech in Portuguese
This paper presents some work on direct and indirect speech in Portuguese using corpus-based methods: we report on a study whose aim was to identify (i) Portuguese verbs used to introduce reported speech and (ii) syntactic patterns used to convey reported speech, in order to enhance the performance of a quotation extraction system, dubbed QUEMDISSE?. In addition, (iii) we present a Portuguese c...
A Lexicon of French Quotation Verbs for Automatic Quotation Extraction
Quotation extraction is an important information extraction task, especially when dealing with news wires. Quotations can be found in various configurations. In this paper, we focus on direct quotations introduced by a parenthetical clause, headed by a “quotation verb”. Our study is based on a large French news wire corpus from the Agence France-Presse. We introduce and motivate an analysis at ...
Extraction of Unmarked Quotations in Newspapers: A Study Based on Direct Speech Extraction Systems
This paper presents work in progress to automatically extract quotation sentences from newspaper articles. The focus is the extraction and annotation of unmarked quotation sentences. A linguistic study shows that unmarked quotation sentences can be formalised into 16 patterns that can be used to develop an extraction grammar. The question of unmarked quotation boundaries identification is also ...
Automatically Detecting and Attributing Indirect Quotations
Direct quotations are used for opinion mining and information extraction as they have an easy to extract span and they can be attributed to a speaker with high accuracy. However, simply focusing on direct quotations ignores around half of all reported speech, which is in the form of indirect or mixed speech. This work presents the first large-scale experiments in indirect and mixed quotation ex...
An Apparel Trade Quotation Architecture Based on BPM and SOA
Based on the analysis of problems and difficulties in apparel quotation system, this paper puts forward the combination of BPM and SOA as a new idea for analysis of apparel quotation system, according to their advantages in business goals and requirements analysis, and the corresponding services’ definition, extraction, optimization and integration. Through the combination, system flexibility, ...
Publication date: 2011